虽然最先进的视觉变压器模型实现了图像分类的有希望的结果,但它们是非常昂贵的并且需要许多GFLOPS。尽管可以通过减少网络中的令牌数量来降低视觉变压器的GFLOPS,但是没有对所有输入图像的最佳设置。因此,在这项工作中,我们引入了可分辨率的无参数自适应令牌采样(ATS)模块,可以插入任何现有的视觉变压器架构。通过评分和自适应采样重要令牌,在视觉变压器上实现视觉变压器。结果,令牌的数量不再静态,但是每个输入图像都变化。通过将ATS集成为当前变压器块内的附加层,我们可以将它们转换为具有自适应令牌的更高效的视觉变压器。由于ATS是一种无参数模块,因此它可以作为即插即用模块添加到从货架上的预制视觉变压器中,从而在没有任何额外训练的情况下减少他们的GFLOP。但是,由于其可分辨动的设计,人们还可以培训配有ATS的视觉变压器。通过将其添加到多个最先进的视觉变压器,我们在想象成数据集上进行评估。我们的评估表明,通过将计算成本(GFLOPS)降低37%,在保留准确性时,该模块通过降低了37%,提高了最先进的模块。
translated by 谷歌翻译
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants and only 50% of the participants performed ensembling based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
translated by 谷歌翻译
Aiming at highly accurate object detection for connected and automated vehicles (CAVs), this paper presents a Deep Neural Network based 3D object detection model that leverages a three-stage feature extractor by developing a novel LIDAR-Camera fusion scheme. The proposed feature extractor extracts high-level features from two input sensory modalities and recovers the important features discarded during the convolutional process. The novel fusion scheme effectively fuses features across sensory modalities and convolutional layers to find the best representative global features. The fused features are shared by a two-stage network: the region proposal network (RPN) and the detection head (DH). The RPN generates high-recall proposals, and the DH produces final detection results. The experimental results show the proposed model outperforms more recent research on the KITTI 2D and 3D detection benchmark, particularly for distant and highly occluded instances.
translated by 谷歌翻译
Knowledge Distillation (KD) has been extensively used for natural language understanding (NLU) tasks to improve a small model's (a student) generalization by transferring the knowledge from a larger model (a teacher). Although KD methods achieve state-of-the-art performance in numerous settings, they suffer from several problems limiting their performance. It is shown in the literature that the capacity gap between the teacher and the student networks can make KD ineffective. Additionally, existing KD techniques do not mitigate the noise in the teacher's output: modeling the noisy behaviour of the teacher can distract the student from learning more useful features. We propose a new KD method that addresses these problems and facilitates the training compared to previous techniques. Inspired by continuation optimization, we design a training procedure that optimizes the highly non-convex KD objective by starting with the smoothed version of this objective and making it more complex as the training proceeds. Our method (Continuation-KD) achieves state-of-the-art performance across various compact architectures on NLU (GLUE benchmark) and computer vision tasks (CIFAR-10 and CIFAR-100).
translated by 谷歌翻译
With the wide-spread application of machine learning models, it has become critical to study the potential data leakage of models trained on sensitive data. Recently, various membership inference (MI) attacks are proposed that determines if a sample was part of the training set or not. Although the first generation of MI attacks has been proven to be ineffective in practice, a few recent studies proposed practical MI attacks that achieve reasonable true positive rate at low false positive rate. The question is whether these attacks can be reliably used in practice. We showcase a practical application of membership inference attacks where it is used by an auditor (investigator) to prove to a judge/jury that an auditee unlawfully used sensitive data during training. Then, we show that the auditee can provide a dataset (with potentially unlimited number of samples) to a judge where MI attacks catastrophically fail. Hence, the auditee challenges the credibility of the auditor and can get the case dismissed. More importantly, we show that the auditee does not need to know anything about the MI attack neither a query access to it. In other words, all currently SOTA MI attacks in literature suffer from the same issue. Through comprehensive experimental evaluation, we show that our algorithms can increase the false positive rate from ten to thousands times larger than what auditor claim to the judge. Lastly, we argue that the implication of our algorithms is beyond discredibility: Current membership inference attacks can identify the memorized subpopulations, but they cannot reliably identify which exact sample in the subpopulation was used during training.
translated by 谷歌翻译
对比度学习是视觉表示学习最成功的方法之一,可以通过在学习的表示上共同执行聚类来进一步提高其性能。但是,现有的联合聚类和对比度学习的方法在长尾数据分布上表现不佳,因为多数班级压倒了少数群体的损失,从而阻止了学习有意义的表示形式。由此激励,我们通过适应偏见的对比损失,以避免群集中的少数群体类别的不平衡数据集来开发一种新颖的联合聚类和对比度学习框架。我们表明,我们提出的修改后的对比损失和分歧聚类损失可改善多个数据集和学习任务的性能。源代码可从https://anonymon.4open.science/r/ssl-debiased-clustering获得
translated by 谷歌翻译
通常根据历史崩溃数据来实践道路的风险评估。有时缺少有关驾驶员行为和实时交通情况的信息。在本文中,安全的路线映射(SRM)模型是一种开发道路动态风险热图的方法,可扩展在做出预测时考虑驾驶员行为。 Android应用程序旨在收集驱动程序的信息并将其上传到服务器。在服务器上,面部识别提取了驱动程序的数据,例如面部地标,凝视方向和情绪。检测到驾驶员的嗜睡和分心,并评估驾驶性能。同时,动态的流量信息由路边摄像头捕获并上传到同一服务器。采用基于纵向扫描的动脉交通视频分析来识别视频中的车辆以建立速度和轨迹概况。基于这些数据,引入了LightGBM模型,以预测接下来一两秒钟的驾驶员的冲突指数。然后,使用模糊逻辑模型合并了多个数据源,包括历史崩溃计数和预测的交通冲突指标,以计算道路细分的风险评分。使用从实际的交通交叉点和驾驶模拟平台收集的数据来说明所提出的SRM模型。预测结果表明该模型是准确的,并且增加的驱动程序行为功能将改善模型的性能。最后,为可视化目的而生成风险热图。当局可以使用动态热图来指定安全的走廊,并调度执法部门以及驱动程序,以预警和行程计划。
translated by 谷歌翻译
从积极和未标记的(PU)数据中学习是一种设置,学习者只能访问正面和未标记的样本,而没有关于负面示例的信息。这种PU环境在各种任务中非常重要,例如医学诊断,社交网络分析,金融市场分析和知识基础完成,这些任务也往往本质上是不平衡的,即大多数示例实际上是负面的。但是,大多数现有的PU学习方法仅考虑人工平衡的数据集,目前尚不清楚它们在不平衡和长尾数据分布的现实情况下的表现如何。本文提议通过强大而有效的自我监督预处理来应对这一挑战。但是,培训传统的自我监督学习方法使用高度不平衡的PU分布需要更好的重新重新制定。在本文中,我们提出\ textit {Impulses},这是\ usewanced {im}平衡\下划线{p} osive \ unesive \ usepline {u} nlabeLed \ underline {l}的统一表示的学习框架{p}。 \下划线{s}削弱了debiase预训练。 Impulses使用大规模无监督学习的通用组合以及对比度损失和额外重新持续的PU损失的一般组合。我们在多个数据集上进行了不同的实验,以表明Impuls能够使先前最新的错误率减半,即使与先前给出的真实先验的方法相比。此外,即使在无关的数据集上进行了预处理,我们的方法也表现出对事先错误指定和卓越性能的鲁棒性。我们预计,这种稳健性和效率将使从业者更容易在其他感兴趣的PU数据集上获得出色的结果。源代码可在\ url {https://github.com/jschweisthal/impulses}中获得
translated by 谷歌翻译
计算机视觉技术可以帮助自动化或部分自动化口面损伤的临床检查,以提供准确和客观的评估。为了开发此类自动化系统,我们评估了两种在口面评估视频中检测和时间分段(分析)重复的方法。从多伦多神经曲面数据集获得了患有肌萎缩性侧索硬化症(ALS)和健康对照(HC)个体的参与者的录制视频。检查了两种重复检测和解析方法:一种基于轨迹地标的工程特征和上嘴唇和下唇的朱红色 - 二连交界之间的距离(基线分析)的峰值检测(基线分析),另一种是使用预训练的变压器 - 基于repnet的基于深度学习模型(Dwibedi等,2020),该模型自动检测周期性,并在视频数据中解析周期性和半周期重复。在对两项口面评估任务的实验评估中 - 重复最大的口腔张开(打开)并重复“购买Bobby a Puppy”(BBP)(BBP) - repnet提供了比基于具有里程碑意义的方法更好的解析,并通过较高的平均相交量化的方法来量化。联合(IOU)关于地面真理手动解析。使用Repnet自动解析还根据BBP重复的持续时间清楚地分离了HC和ALS参与者,而基于里程碑的方法则不能。
translated by 谷歌翻译
通信网络中的时间延迟是通过边缘部署机器人的主要关注点之一。本文提出了一个多阶段的非线性模型预测控制(NMPC),该控制能够处理不同的网络引起的时间延迟,以建立控制框架,以确保无碰撞的无碰撞微型航空车(MAVS)导航。这项研究介绍了一种新颖的方法,该方法通过与现有的典型多阶段NMPC相反的离散化场景树来考虑不同的采样时间,在这种情况下,系统不确定性是由场景树建模的。此外,该方法根据通信链接中时间延迟的概率考虑了多阶段NMPC方案的自适应权重。由于多阶段NMPC,获得的最佳控制动作对于多个采样时间有效。最后,在各种测试和不同的模拟环境中证明了所提出的新型控制框架的总体有效性。
translated by 谷歌翻译